Weaviate

Overview

Weaviate is a vector database for AI applications available as a cloud service. More information can be found at https://weaviate.io/. Information on querying Weaviate can be found at https://weaviate.io/developers/weaviate/tutorials/query. Excellent tutorial content can be found at https://weaviate.io/developers/academy.

Qarbine uses Weaviate’s native vector database query features. Weaviate's query interface is based on GraphQL. You can query Weaviate using a semantic (i.e., vector) search, a lexical (i.e., scalar) search, or a combination of the two. The former finds objects that are ‘similar’ to the input, while the latter uses more traditional matching techniques. Weaviate offers a variety of parameters to control how the comparison is done and which properties are returned. Below is an example of a movie object stored in Weaviate.

{
  "metadata": {
    "distance": 0.17359280586242676,
    "certainty": 0.9132035970687866
  },
  "properties": {
    "worst_rating": 1,
    "director": "indar dzhendubaev",
    "review_date": "12/20/2018",
    "duration": "1H50M",
    "url": "https://www.imdb.com/title/tt4057376/",
    "title": "on - drakon",
    "best_rating": 10,
    "genres": "Adventure,Fantasy,Romance",
    "actors": "matvey lykov,mariya poezzhaeva,stanislav lyubshin,pyotr romanov",
    "keywords": "dragon,dragonslayer,uncharted island,wedding,3d",
    "rating_value": 6.9,
    "review_body": "Beautiful in every way. ...",
    "movie_id": 44226,
    "poster_link": "https://m.media-amazon.com/images..",
    "description": "On - drakon is a movie….",
    "date_published": "12/3/2015",
    "rating_count": 4136,
    "review_aurthor": "botfish-1"
  },
  "vector": [ 1,2,....],
  "uuid": "c0b5e5dd-70da-4a86-90e4-66a84f2efe7c"
}

It is from a sample dataset used in a Weaviate movie search engine example found at https://github.com/weaviate/weaviate-examples/tree/main/movies-search-engine. For your convenience, the folder ~/qarbine.service/sample/weaviate has a Python file to load the data.

Unlike the strictly columnar result rows found in SQL databases, the matching Weaviate objects are returned as nested JSON objects. In the example, metadata and properties are fields that contain other fields. The properties field may in turn contain further nested values.
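As a rough illustration of this nested shape outside Qarbine, the Python sketch below reads a few fields from a trimmed copy of the movie object above. The helper name certainty_from_distance is hypothetical. It assumes the collection uses cosine distance, for which Weaviate relates the two metadata values as certainty = 1 - distance / 2; the sample values above are consistent with that relation.

```python
# Trimmed copy of the movie object shown above (most properties omitted).
movie = {
    "metadata": {"distance": 0.17359280586242676, "certainty": 0.9132035970687866},
    "properties": {
        "title": "on - drakon",
        "rating_value": 6.9,
        "genres": "Adventure,Fantasy,Romance",
    },
    "uuid": "c0b5e5dd-70da-4a86-90e4-66a84f2efe7c",
}


def certainty_from_distance(distance: float) -> float:
    """Assumes cosine distance, where certainty = 1 - distance / 2."""
    return 1.0 - distance / 2.0


# Nested fields are reached by walking through the containing field first.
title = movie["properties"]["title"]
genres = movie["properties"]["genres"].split(",")
distance = movie["metadata"]["distance"]

print(title, genres, round(certainty_from_distance(distance), 4))
```

Note that genres, actors, and keywords are stored as comma separated strings in this dataset, so they need splitting before they can be treated as lists.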

A Weaviate query in Qarbine is specified with a structure similar to that of a JSON object. Below is an example that retrieves up to 3 movies that have some similarity to “dracula”.

{
  "collection": "movies",
  "limit": 3,
  "nearText": "dracula"
}

This is a simple example. Weaviate also allows the nearText argument to be a phrase rather than a single word. Qarbine can easily analyze this deeply nested data and format an analysis report. This interaction can also be embedded into applications for a seamless end user experience. Sample output is shown below.

  

This report result can then be exported into various popular formats and easily shared within leading collaboration tools.

Defining a Data Source

Overview

A Data Source is a Qarbine component responsible for retrieving data from somewhere. At a high level it has a name, a description and some arbitrary query string which when sent to the associated Qarbine Data Service endpoint returns some data. The overall execution flow for an analysis, including the optional prompt component, is shown below.

  

A single data source can be referenced by name from multiple Qarbine template components. This provides a single point of change when, for example, an index is added or some other query tweak is necessary. The alternative is to find and update every template impacted by the schema or index change. This component reusability is especially beneficial when team members have varying roles and skills.

Weaviate Query Specification

The most flexible means to specify a Weaviate query uses a JSON oriented specification. An example is shown below.

{
  "collection": "movies",
  "limit": 3,
  "nearText": "dracula",
  "returnProperties": ["title", "duration", "poster_link",
    "url", "rating_value", "actors", "date_published"]
}

This query retrieves up to 3 movies similar to “dracula” and returns the given set of properties. A complete guide to querying Weaviate from Qarbine is found in the “Weaviate Interactions” document. Much more complex queries can be written to leverage Weaviate’s vector database capabilities. Qarbine also provides a SQL-like interface which covers most of the Weaviate features and offers a much easier authoring environment.
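For readers curious how such a specification relates to Weaviate's GraphQL interface, the Python sketch below renders a simple nearText specification as a GraphQL Get query. It is only an illustration of the correspondence, not Qarbine's actual translation, which is not described here. The function name to_graphql is hypothetical, and the sketch assumes the collection is known to Weaviate under the capitalized name Movies.

```python
import json


def to_graphql(spec: dict) -> str:
    """Render a simple nearText specification as a Weaviate GraphQL Get query."""
    # Weaviate class (collection) names begin with an uppercase letter.
    name = spec["collection"]
    class_name = name[0].upper() + name[1:]
    # json.dumps produces a double-quoted, escaped string literal.
    concept = json.dumps(spec["nearText"])
    args = f"nearText: {{concepts: [{concept}]}}, limit: {int(spec['limit'])}"
    fields = " ".join(spec.get("returnProperties", []))
    return (
        f"{{ Get {{ {class_name}({args}) "
        f"{{ {fields} _additional {{ distance certainty }} }} }} }}"
    )


spec = {
    "collection": "movies",
    "limit": 3,
    "nearText": "dracula",
    "returnProperties": ["title", "duration", "url"],
}
print(to_graphql(spec))
```

The _additional block is where GraphQL responses carry values such as distance and certainty, which correspond to the metadata field in the movie object shown earlier.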

A sample result of running the data source is shown below.

  

The details of that element are shown below.

  

Notice that Qarbine automatically pulls up some of the nested fields. This flattening occurs at only one level. Recall that the raw properties from Weaviate may contain other nested values. Qarbine is perfectly fine with this data structure and even with those that are dynamic.
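The one-level flattening can be pictured with the short Python sketch below. It is an assumption-laden illustration, not Qarbine's implementation: the function name flatten_one_level, the choice of metadata and properties as the fields to lift, and the handling of name clashes (later values win) are all hypothetical.

```python
def flatten_one_level(obj: dict, containers=("metadata", "properties")) -> dict:
    """Lift the direct children of the container fields to the top level."""
    flat = {}
    for key, value in obj.items():
        if key in containers and isinstance(value, dict):
            # Only direct children are lifted; deeper nesting is kept as-is.
            flat.update(value)
        else:
            flat[key] = value
    return flat


raw = {
    "metadata": {"distance": 0.17},
    "properties": {"title": "on - drakon", "extra": {"nested": True}},
    "uuid": "c0b5e5dd-70da-4a86-90e4-66a84f2efe7c",
}
row = flatten_one_level(raw)
print(sorted(row))
```

In this sketch title and distance become top-level fields of the row, while the made-up extra value stays nested, mirroring the single level of flattening described above.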

The query specification can be saved in the shared catalog as a data source named “Retrieve 3 dracula movies”.

Managing Answer Set Size

The default maximum number of rows starts off at 25 for a new data source. This is useful while evolving a query from a concept to one that you have verified returns the desired answer set. Any native way of limiting the answer set size, such as the limit setting in the query specification, is the preferred approach. The maximum rows setting is in the component dialog as shown below and is also accessible by clicking the ‘Gear’ icon.

  

Once you are done drafting you can adjust this parameter. A “0” indicates there is no maximum. A number greater than 0 indicates to limit the final answer set size to that number of rows. This answer set truncation comes after any native query limit. So, if the answer set from the data endpoint is quite large, that content has to be returned to the Qarbine host. It then may truncate the number of rows. It is best to truncate at the query level (i.e., use a limit) to reduce the content sent from the data endpoint to the Qarbine host in the first place.
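The maximum rows rule can be summarized with the minimal Python sketch below. The function name apply_max_rows is hypothetical and the list of rows stands in for whatever the data endpoint returned after its own native limit was applied.

```python
def apply_max_rows(rows: list, max_rows: int) -> list:
    """Return at most max_rows rows; 0 means there is no maximum."""
    if max_rows < 0:
        raise ValueError("max_rows must be 0 or greater")
    return rows if max_rows == 0 else rows[:max_rows]


# Pretend the data endpoint already applied a native limit of 100.
endpoint_rows = [{"movie_id": i} for i in range(100)]

print(len(apply_max_rows(endpoint_rows, 25)))  # 25
print(len(apply_max_rows(endpoint_rows, 0)))   # 100
```

In both calls all 100 rows were already transferred before any truncation happened, which is why a limit in the query itself is the cheaper option.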

Adjusting Maximum Rows

Recall the default maximum rows at the component level is 25. When you are satisfied with your query you can change that setting by clicking.

  

Adjust the setting to “0” indicating no Qarbine answer set truncation.

  

Click

  

Defining an Analysis Template

Overview

A template defines how to process the data being retrieved from Data Source queries and other data expressions. It also defines formulas, formatting options, and other analysis and presentation options. The overall execution flow for an analysis, including the optional prompt component, is shown below.

  

In this example we will discuss how the output showing 2 of the 3 movies below was obtained through a Qarbine template.

  

Main Properties

Open the Template Designer.

First, associate the Data Source with the Template.
Next, click

  

Enter a name

  

Choose the data source by clicking the recents button

  

Select the component

  

Click

  

The result is

  

Close the property dialog by clicking

  

The right hand side of the Template Designer shows any metadata about the data source's data. (No cell may be selected in the grid area for this to appear.) In this case it shows the movie element structure from the main data source above.

  

The eventual template is shown below.

  

The 1.1.1 Body cells on the first line are described below.

  

The ‘#’ indicates a field of the current element. In this case it is a movie element. The second body line has the description and a custom cell for the poster image.

  

On the right side of the template the image has its multi-line height set as shown below.

  

In the output, clicking the poster opens up an interactive image viewer.

The third body line has the following cells. The first one is a custom cell button which opens a web tab on the movie's IMDb listing using the URL value.

  

When the template is run, up to 3 movies are retrieved. For each movie, the body cells are formatted per the specification above.

This template can be saved in the catalog as a component named “Report with 3 dracula movies”.

Increasing Component Flexibility

Updated Data Source

The original data source had a hard-coded search phrase of “dracula”. We can define a new data source which uses a variable rather than that hard-coded value.

Load the previous data source and adjust the query specification as shown below.

{
  "collection": "movies",
  "limit": 3,
  "nearText": [! @similarToWhat !],
  "returnProperties": ["title", "duration", "poster_link",
    "url", "rating_value", "actors", "date_published"]
}

The “[!... !]” sequence acts as a placeholder in the query for a macro language expression. The “@similarToWhat” indicates a variable replacement. Running the data source presents a dialog to obtain the variable value.

  

Enter a phrase into the entry field and click “OK”. The effective query with the variable replaced by the input is then run and the results presented.
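The variable replacement step can be pictured with the Python sketch below. It is not Qarbine's macro engine: it handles only the simple “@variable” case rather than full macro language expressions, the function name expand is hypothetical, and inserting the value as a quoted JSON string is an assumption about how the effective query is formed.

```python
import json
import re

# Matches "[! @name !]" with optional whitespace around the variable.
PLACEHOLDER = re.compile(r"\[!\s*@(\w+)\s*!\]")


def expand(query_text: str, variables: dict) -> str:
    """Replace each placeholder with its variable value as a JSON literal."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"no value supplied for @{name}")
        # json.dumps quotes and escapes the user's input safely.
        return json.dumps(variables[name])
    return PLACEHOLDER.sub(replace, query_text)


template = '{ "collection": "movies", "limit": 3, "nearText": [! @similarToWhat !] }'
effective = expand(template, {"similarToWhat": "haunted castle"})
print(effective)
```

Quoting the value with json.dumps rather than plain string concatenation keeps the effective query well formed even when the entered phrase contains quotation marks.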

Click    (Save as) and give it the name “Retrieve 3 @similarToWhat movies”.

Template

Load the “Report with 3 dracula movies” template.

Click    (Save as) and give it the name “Report with 3 @similarToWhat movies”.

Click   to open the properties dialog.

Click    as noted below to choose a recent data source.

Select the row highlighted below.

  

Click

  

The dialog now shows

  

To close the dialog click

  

Save the updated template by clicking

  

Prompt Integration

Overview

Qarbine prompts provide a way to obtain runtime values and variables for data source and template execution. To avoid hardcoding, prompts can use macro formulas to run queries which populate list widgets. Prompts are defined in a no code manner using the Prompt Designer. Shown below is the execution flow when there is a Prompt component.

  

The Prompt Designer supports a large variety of input widgets including entry fields, check boxes, radio button groups, sliders, and file input.

Example

One possible use would be to prompt the user for the movie similarity criteria that the above report should use when it runs. A sample is shown below.

  

Open the Prompt Designer. This can be used to prompt for multiple values. The Prompt Designer provides a large variety of widgets to choose from in a no-code fashion to create dialog prompts. This Prompt has 2 elements (a heading and a text input) as shown below.

  

The high level properties of the heading are shown below.

  

Notice the image URL can be a macro language expression and not just a simple string. The high level properties of the text input element are shown below.

  

This prompt element sets a value for the ‘similarToWhat’ variable.

To save this Prompt component click

  

Give it the name “Search for @similarToWhat”.

Creating a Dynamic Template

Go back to the “Report with 3 @similarToWhat movies” template. If necessary open the Template Designer on that recent component.

Click   to open the properties dialog.

Click the 2nd tab as shown below.

  

In the drop down choose the option shown below and then click the far right recents icon.

  

Choose the recently saved prompt.

  

Confirm the selection by clicking

  

The dialog now shows

  

To close the dialog click

  

Adjust the report header as shown below so that it includes the phrase the user chose.

  

Save the template by clicking

  

To run the template click

  

The prompt is presented for the similarity phrase to use.

  

Click

  

The report result is shown.

  

This is a basic example of using dynamic Weaviate querying and its nested JSON objects within an analytic reporting environment. When Qarbine is embedded into applications, the runtime similarToWhat variable could be fed from the application, rather than through a Qarbine prompt.

Next Steps

Accessing Your Database

To configure access to your database see the guides at

Querying Your Database

For database specific interaction guides navigate to

References

For more information see https://weaviate.io/developers/weaviate/search/filters.